Affine and Regional Dynamic Time Warping

Tsu-Wei Chen, Meena Abdelmaseeh, and Daniel Stashuk 


Abstract —Pointwise matches between two time series are of great importance in time series analysis, and dynamic time warping 
(DTW) is known to provide generally reasonable matches. There are situations where time series alignment should be invariant to 
scaling and offset in amplitude or where local regions of the considered time series should be strongly reflected in pointwise matches. 
Two different variants of DTW, affine DTW (ADTW) and regional DTW (RDTW), are proposed to handle scaling and offset in amplitude 
and provide regional emphasis respectively. Furthermore, ADTW and RDTW can be combined in two different ways to generate 
alignments that incorporate advantages from both methods, where the affine model can be applied either globally to the entire time 
series or locally to each region. The proposed alignment methods outperform DTW on specific simulated datasets, and 
one-nearest-neighbor classifiers using their associated difference measures are competitive with the difference measures associated 
with state-of-the-art alignment methods on real datasets. 

Index Terms —Pattern recognition, time series, algorithms, alignment, similarity measures. 
1 Introduction

A time series is a sequence of values that are typically arranged in chronological order, and data of this form is abundant in everyday life. Discovering a set of matches between points in two time series can be tremendously useful for analysis. If point $a$ in time series $s$ is of high interest, researchers may also be interested in finding the point that best matches point $a$ in another time series $t$. Points of interest include hints of financial meltdown in stock market prices, regulatory genes in gene expression data and earthquake activity in seismic data.

Dynamic time warping (DTW) is a method that matches points in two time series based on the assumption that non-linear temporal variations exist. Figure 1 illustrates a purely vertical alignment (an alignment is defined to be a set of matches) and the DTW alignment of two time series subject to non-linear temporal variations, where a match between two points from different time series is illustrated by connecting the two points with a line. The DTW alignment is much more visually consistent. This work focuses on DTW because it is a widely known method for aligning two time series that has enjoyed success in many domains. In addition, a comprehensive survey demonstrated that DTW outperforms many other methods across many applications [1]. However, DTW can produce pathological alignments, which are identified based on a context wherein the identifier has a particular model in mind. As a result, there has been a large influx of methods based on DTW to realize particular models for obtaining better alignments under specific contexts [2].

Fig. 1: Vertical vs DTW alignment. (A) Vertical alignment. (B) DTW alignment.

In this work, two alignment methods that add specific models to DTW are proposed: affine DTW (ADTW) and regional DTW (RDTW). ADTW models one time series as an amplitude-scaled and offset-biased version of another time series during alignment. It tries to find the best scaling, offset and alignment simultaneously. Numerous types of time series fall under this affine model. For example, temperature and humidity data can be subject to different scalings, offsets and temporal variations depending on the geographic location and environment. An unintuitive alignment produced by DTW for two temperature time series subject to scaling, offset and temporal variation is illustrated in Figure 2A, where DTW matches a large number of points in one time series to the peak in another time series. While normalizing $s$ and $t$ before applying DTW can alleviate this undesired behavior to some extent, ADTW nonetheless provides the most visually consistent alignment (see Figure 2B and C). ADTW is a simplified version of the method proposed in [3], where scaling, offset, rotation, and shear/squeeze mappings are imposed on images when applying DTW. ADTW only models scaling and offset, so its alignment will not be confused by modeling of rotation and shear/squeeze mappings for time series that cannot undergo such transformations.

Fig. 2: DTW vs ADTW alignment of daily temperature across a year in Sherbrooke and Resolute, Canada. Time series $s$: temperature data in Sherbrooke. Time series $t$: temperature data in Resolute.

Scenarios may arise where local regions in a time series reflect components of interest, so these regions should be emphasized to find a more desirable alignment between two time series. Regional DTW (RDTW) is proposed to accommodate this scenario by substituting the pointwise distance in DTW with a regional distance. Many types of time series contain components of interest that should be focused on. For example, a motor unit potential (MUP) is the ensemble summation of several muscle fiber potentials (MFPs), and their analysis is crucial to determining the characteristics of the MUP. In Figure 3, each MUP (time series $s$ and $t$ respectively) is the ensemble summation of two MFPs. Each MFP can be shifted in time by a different amount and can thus be subject to different degrees of overlap. DTW produces a poor alignment where a large portion of the leftmost MFP in $t$ is matched to the rightmost MFP in $s$ (see Figure 3B). In contrast, RDTW aligns the constituent MFP contributions in a more desirable manner (see Figure 3C).


Fig. 3: DTW vs RDTW alignment of two MUPs with different degrees of MFP overlap. (A) Underlying MFP components. (B) DTW. (C) RDTW.


ADTW and RDTW can be combined to include both affine modeling and emphasis on local regions. Two different ways of combining ADTW and RDTW (one in a global manner and one in a local manner) are proposed. Global-affine RDTW (GARDTW) models one time series as a scaled and offset version of another time series when aligning them with regional emphasis. For example, the amount of rainfall over time is subject to different scalings, offsets and temporal variations across different locations. In the analysis of this rainfall data, placing an emphasis on sections with short but heavy amounts of rainfall can be useful in predicting such behaviors. The preference in this example is to emphasize matching components reflective of short but heavy amounts of rainfall within two time series that can undergo scaling, offset and temporal variation. Local-affine RDTW (LARDTW) emphasizes regions when performing an alignment, where each region in one time series is modeled as a scaled and offset version of the respective matched region in another time series. Revisiting the MUP example, the same MFP can have different scalings caused by slight electrode movement. The objective in this example is to correctly align the MFPs that can undergo different scalings within two MUPs.

The rest of this paper is organized as follows. Section 2 reviews the technical details of DTW and describes the proposed methods of ADTW, RDTW, GARDTW and LARDTW in detail. Section 3 covers evaluations of the alignments and difference measures generated by the proposed methods. Finally, Section 4 concludes this paper. All figures and results in this paper can be easily reproduced using the publicly available code at [4].


2 Alignment Methodology 
2.1 Notation 

Let $s = (s_1, s_2, \ldots, s_n) \in \mathbb{R}^n$ and $t = (t_1, t_2, \ldots, t_m) \in \mathbb{R}^m$ be two time series of interest. Also, let $p$ represent a sequence of matched points between $s$ and $t$, where

$$p = \{p(1) = (a_1, b_1),\ p(2) = (a_2, b_2),\ \ldots,\ p(|p|) = (a_{|p|}, b_{|p|})\}$$

and $(a_k, b_k) \in p$ means that point $s_{a_k}$ is matched to point $t_{b_k}$. In addition, let $d$ be a difference measure between two points, and $d$ is assumed to be the squared difference unless mentioned otherwise.


2.2 DTW Review 

DTW is a method that matches points in two time series that are subject to non-linear temporal variations. For a pair of time series $s$ and $t$, DTW searches for an optimal alignment $p^*$ among all possible alignments $p \in \mathcal{P}$ such that

$$D(s,t,p) = \sum_{k=1}^{|p|} d(s_{a_k}, t_{b_k})$$

is minimized subject to the following constraints:

• Boundary: $p(1) = (1,1)$ and $p(|p|) = (n,m)$.

• Monotonicity: If $p(k) = (a,b)$ and $p(k+1) = (c,d)$, then $c \ge a$ and $d \ge b$ $\forall k$.

• Step Size: If $p(k) = (a,b)$ and $p(k+1) = (c,d)$, then $c - a \le 1$ and $d - b \le 1$ $\forall k$.

For simplicity, the boundary, monotonicity and step size constraints will be jointly referred to as the DTW constraints. $D(s,t,p^*)$ will also be referred to as the DTW difference measure.

Dynamic programming is effective in reducing the time complexity of finding the optimal alignment $p^*$ for this constrained optimization problem, because $p^*$ has optimal substructure and there are overlapping subproblems. A solution can be formulated using these properties. Let $p^*_{(a,b)}$ be the optimal alignment for $(s_1, s_2, \ldots, s_a)$ and $(t_1, t_2, \ldots, t_b)$ subject to the DTW constraints. First, a table of $D(s,t,p^*_{(i,j)})$ values is constructed for all $1 \le i \le n$ and $1 \le j \le m$, where $i$ is the row position and $j$ is the column position. This table is referred to as the DTW table, and $p^*$ can be found after building this table. The DTW table can be filled starting from the first row ($i = 1$) from left to right ($j = 1$ to $j = m$), and then the next row ($i = 2$) can be filled from left to right as well until the $n$-th row is reached. The update formula is as follows:

$$D(s,t,p^*_{(a,b)}) = d(s_a, t_b) + \min\left(D(s,t,p^*_{(a-1,b-1)}),\ D(s,t,p^*_{(a,b-1)}),\ D(s,t,p^*_{(a-1,b)})\right) \quad (1)$$

After constructing the DTW table, a backtracking procedure can be applied on the DTW table starting from $D(s,t,p^*_{(n,m)})$ to generate the optimal alignment $p^*$.
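The table construction and backtracking described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' released code: the function name `dtw`, the infinite border row and column, and the tie-breaking order during backtracking (diagonal first) are assumptions made for the sketch, and no band constraint is applied.

```python
import numpy as np

def dtw(s, t):
    """Return (DTW difference measure, alignment) for 1-D series s and t.

    The alignment is a list of 1-indexed pairs (a_k, b_k) as in Section 2.1.
    """
    s, t = np.asarray(s, dtype=float), np.asarray(t, dtype=float)
    n, m = len(s), len(t)
    # table[i, j] holds D(s, t, p*_(i,j)); row 0 and column 0 form an
    # infinite border so that the boundary constraint is enforced (assumption).
    table = np.full((n + 1, m + 1), np.inf)
    table[0, 0] = 0.0
    for i in range(1, n + 1):            # rows, first to n-th
        for j in range(1, m + 1):        # columns, left to right
            cost = (s[i - 1] - t[j - 1]) ** 2   # squared difference d
            table[i, j] = cost + min(table[i - 1, j - 1],
                                     table[i, j - 1],
                                     table[i - 1, j])
    # Backtrack from (n, m) to (1, 1), always moving to the cheapest
    # predecessor allowed by the monotonicity and step size constraints.
    i, j = n, m
    path = [(i, j)]
    while (i, j) != (1, 1):
        candidates = [(i - 1, j - 1), (i, j - 1), (i - 1, j)]
        i, j = min(candidates, key=lambda ij: table[ij])
        path.append((i, j))
    path.reverse()
    return table[n, m], path
```

For example, `dtw([1, 2, 3], [1, 2, 2, 3])` returns a difference of 0.0 with the alignment `[(1, 1), (2, 2), (2, 3), (3, 4)]`, where the repeated value in the second series is absorbed by matching $s_2$ twice.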


Additional constraints can be placed on alignments to 1) eliminate pathological alignments, and 2) reduce time and space complexity. A well-known constraint is the Sakoe-Chiba band, where $s_i$ can only be matched to points from the set $\{t_{i-w_q}, \ldots, t_{i+w_q}\}$ and $w_q \in \mathbb{Z}_{\ge 0}$. The width of the band $w_b$ is given by $w_b = 1 + 2w_q$, and it is recommended to be tuned based on specific problems [1].

For simplicity of complexity analysis, let us assume that $n = m$. Since the update formula in Equation 1 for filling each element in the DTW table is $O(1)$ in time, filling the entire table requires a time complexity of $O(w_b n)$ under the Sakoe-Chiba band. The space complexity for constructing the DTW table is $O(w_b n)$ as well. The longest path that can be obtained from backtracking is $2n - 1$, so backtracking is $O(n)$ in time and space. Hence, the total time and space complexities for finding the optimal alignment $p^*$ are $O(w_b n)$.


2.3 Affine DTW

Affine DTW (ADTW) extends DTW to allow arbitrary scaling and offset in amplitude between two time series subject to temporal variations. In ADTW, $s$ is assumed to be a scaled and offset version of $t$ with temporal variations. In more formal terms, the goal is to find a path $p^*$, scaling $c^* \in \mathbb{R}$ and offset $e^* \in \mathbb{R}$ that minimize

$$D_A(s,t,p,c,e) = \sum_{k=1}^{|p|} d(s_{a_k},\ c\,t_{b_k} + e)$$

subject to the DTW constraints. For brevity, $D_A(s,t,p,c,e)$ subject to the DTW constraints will be referred to as $D_A(s,t,p,c,e)_{\text{constr.}}$. This formulation aims to optimize for a global minimum in $D_A(s,t,p,c,e)_{\text{constr.}}$ with respect to $p$, $c$ and $e$ simultaneously, which is different from finding the scaling and offset first prior to applying DTW to obtain an alignment.


Algorithm 1 ADTW

1: $p^\dagger, c^\dagger, e^\dagger$, $c^\dagger_0 \leftarrow 1$, $e^\dagger_0 \leftarrow 0$, $D_{A,\text{prev}} \leftarrow \infty$, $u \leftarrow 1$

2: while 1 do

3: $p^\dagger_u \leftarrow \arg\min_{p} D_A(s,t,p,c^\dagger_{u-1},e^\dagger_{u-1})_{\text{constr.}}$

4: $(c^\dagger_u, e^\dagger_u) \leftarrow \arg\min_{c,e} D_A(s,t,p^\dagger_u,c,e)_{\text{constr.}}$

5: if $D_{A,\text{prev}} - D_A(s,t,p^\dagger_u,c^\dagger_u,e^\dagger_u) < D_{\text{stop}}$ then

6: $p^\dagger \leftarrow p^\dagger_u$, $c^\dagger \leftarrow c^\dagger_u$, $e^\dagger \leftarrow e^\dagger_u$

7: break

8: $u \leftarrow u + 1$


In Algorithm 1, $p^\dagger_u$ is obtained by applying DTW on $s$ and $c^\dagger_{u-1} t + e^\dagger_{u-1}$, and $(c^\dagger_u, e^\dagger_u)$ are computed with the following equations after setting $p = p^\dagger_u$:

$$c^\dagger_u = \frac{\sum_{k=1}^{|p|} s_{a_k} t_{b_k} - \frac{1}{|p|} \sum_{k=1}^{|p|} s_{a_k} \sum_{k=1}^{|p|} t_{b_k}}{\sum_{k=1}^{|p|} t_{b_k}^2 - \frac{1}{|p|} \left(\sum_{k=1}^{|p|} t_{b_k}\right)^2} \quad (2)$$

$$e^\dagger_u = \frac{1}{|p|} \sum_{k=1}^{|p|} \left(s_{a_k} - c^\dagger_u t_{b_k}\right) \quad (3)$$

The above equations can be derived in a manner similar to linear least squares. The scaling $c^\dagger_u$ can be constrained to exist between $c_{\min}$ and $c_{\max}$ to avoid improbable scalings. In addition, $D_A(s,t,p^\dagger,c^\dagger,e^\dagger)$ will be referred to as the ADTW difference measure. Note that a more general version of ADTW where each prespecified subset of points has its own scaling and offset can be solved in a similar way.

Assuming $n = m$, each iteration in Algorithm 1 takes $O(w_b n)$ time and space to run DTW to obtain $p^\dagger_u$. Looking at Equations 2 and 3, computing $(c^\dagger_u, e^\dagger_u)$ is $O(n)$ in time and $O(1)$ in space. Hence, ADTW is $O(n_c w_b n)$ in time and $O(w_b n)$ in space, where $n_c$ is the number of iterations for convergence.
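The alternating scheme of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the released implementation: the helper `dtw_path`, the default `d_stop=1e-6`, the iteration cap `max_iter`, the clipping of the scaling to `c_bounds`, and the explicit update of the previous difference value at the end of each iteration are all choices made here for the example. No Sakoe-Chiba band is applied.

```python
import numpy as np

def dtw_path(s, t):
    """Plain DTW with squared difference; returns 0-indexed matched pairs."""
    n, m = len(s), len(t)
    table = np.full((n + 1, m + 1), np.inf)
    table[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            table[i, j] = (s[i - 1] - t[j - 1]) ** 2 + min(
                table[i - 1, j - 1], table[i, j - 1], table[i - 1, j])
    i, j, path = n, m, [(n - 1, m - 1)]
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j - 1), (i, j - 1), (i - 1, j)],
                   key=lambda ij: table[ij])
        path.append((i - 1, j - 1))
    return path[::-1]

def adtw(s, t, d_stop=1e-6, c_bounds=(0.2, 5.0), max_iter=100):
    """Hard-EM style sketch: returns (difference, path, scaling c, offset e)."""
    s, t = np.asarray(s, dtype=float), np.asarray(t, dtype=float)
    c, e, d_prev = 1.0, 0.0, np.inf
    for _ in range(max_iter):
        # Alignment step: DTW between s and the current affine version of t.
        path = dtw_path(s, c * t + e)
        a, b = map(np.array, zip(*path))
        sa, tb, k = s[a], t[b], len(path)
        # Update step: least-squares scaling and offset for the fixed path.
        denom = np.sum(tb ** 2) - np.sum(tb) ** 2 / k
        if denom > 0:  # keep the previous c if the matched t values are constant
            c = (np.sum(sa * tb) - np.sum(sa) * np.sum(tb) / k) / denom
            c = float(np.clip(c, *c_bounds))
        e = float(np.mean(sa - c * tb))
        d_new = float(np.sum((sa - (c * tb + e)) ** 2))
        if d_prev - d_new < d_stop:   # stopping condition on the improvement
            break
        d_prev = d_new                # assumption: previous value updated here
    return d_new, path, c, e
```

Because each step can only keep or lower the objective, the returned difference is never larger than the plain DTW difference between `s` and `t`, provided the initial scaling of 1 lies inside `c_bounds`.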


Finding $(p^*, c^*, e^*)$ is too computationally expensive because dynamic programming can no longer be applied, so hard expectation-maximization (EM) is used to find a suboptimal solution $(p^\dagger, c^\dagger, e^\dagger)$ in Algorithm 1. Hard EM guarantees that $D_A(s,t,p^\dagger_{u+1},c^\dagger_{u+1},e^\dagger_{u+1})_{\text{constr.}} \le D_A(s,t,p^\dagger_u,c^\dagger_u,e^\dagger_u)_{\text{constr.}}$, and it converges to a local optimum at a linear rate when certain conditions are fulfilled [5].


2.4 Regional DTW

Regional DTW (RDTW) modifies DTW to place more weight on a region of points potentially representative of a component of interest in a time series. This is accomplished by substituting the pointwise distance measure $d$ with a distance $d_r$ that measures the difference between points in a region. Let $w_r = 1 + 2w_h \in \mathbb{Z}_{\ge 1}$ be the region width to consider. Then, RDTW finds an alignment $p^*$ that minimizes

$$D_R(s,t,p,w_h) = \sum_{k=1}^{|p|} d_r(s_{a_k}, t_{b_k}, w_h)$$

subject to the DTW constraints, where

$$d_r(s_a, t_b, w_h) = \frac{1}{W_{a,b}} \sum_{\substack{w = -w_h \\ 1 \le a+w \le n \\ 1 \le b+w \le m}}^{w_h} d(s_{a+w}, t_{b+w})$$

and $W_{a,b}$ is the number of distances added in the above summation. Dynamic programming can be utilized in the same manner as DTW, where the update formula is as follows:

$$D_R(s,t,p^*_{(a,b)},w_h) = d_r(s_a,t_b,w_h) + \min\left(D_R(s,t,p^*_{(a-1,b-1)},w_h),\ D_R(s,t,p^*_{(a,b-1)},w_h),\ D_R(s,t,p^*_{(a-1,b)},w_h)\right) \quad (4)$$

The same DTW techniques can be used to construct the RDTW table and obtain the optimal alignment $p^*$. $D_R(s,t,p^*,w_h)$ is referred to as the RDTW difference measure.

Assuming $n = m$, the RDTW table requires $O(w_b n)$ elements to be filled with the update formula. It turns out that most elements in the RDTW table can be updated with a time complexity of $O(1)$ instead of $O(w_r)$ using the following observation:

$$d_r(s_a,t_b,w_h) = \frac{1}{W_{a,b}} \left[ -d(s_{a-w_h-1}, t_{b-w_h-1}) + W_{a-1,b-1}\, d_r(s_{a-1},t_{b-1},w_h) + d(s_{a+w_h}, t_{b+w_h}) \right]$$

This observation is only applicable when $s_{a-1}$ and $t_{b-1}$ exist, which corresponds to $w_b(n-1)$ elements. The remaining $w_b$ elements take $O(w_r)$ time, so the total time complexity is $O(w_b(n-1)) + O(w_b w_r) = O(w_b n)$ because $w_r \le n$. The total space complexity is also $O(w_b n)$.
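The regional distance and its use inside the DTW recursion can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the regional distance is recomputed from scratch for every table entry (so each entry costs $O(w_r)$ rather than the $O(1)$ achievable with the incremental observation above), and no band constraint is applied.

```python
import numpy as np

def regional_distance(s, t, a, b, w_h):
    """d_r(s_a, t_b, w_h) for 1-indexed a and b with squared difference d."""
    n, m = len(s), len(t)
    total, count = 0.0, 0
    for w in range(-w_h, w_h + 1):
        # Only offsets that stay inside both series contribute.
        if 1 <= a + w <= n and 1 <= b + w <= m:
            total += (s[a + w - 1] - t[b + w - 1]) ** 2
            count += 1               # count plays the role of W_{a,b}
    return total / count             # w = 0 is always valid, so count >= 1

def rdtw(s, t, w_h):
    """RDTW difference measure via the same table recursion as DTW."""
    s, t = np.asarray(s, dtype=float), np.asarray(t, dtype=float)
    n, m = len(s), len(t)
    table = np.full((n + 1, m + 1), np.inf)
    table[0, 0] = 0.0
    for a in range(1, n + 1):
        for b in range(1, m + 1):
            table[a, b] = regional_distance(s, t, a, b, w_h) + min(
                table[a - 1, b - 1], table[a, b - 1], table[a - 1, b])
    return float(table[n, m])
```

With `w_h = 0` the region collapses to a single point, so `rdtw` reduces to the plain DTW difference measure.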

So far, the effects of the region width $w_r$ have not been discussed, although it is crucial to achieving good results. RDTW with different values of $w_h$ is applied to the same MUP alignment example from the introduction in Figure 4. Highly variable alignments are observed across the different widths, and the alignment is most reasonable when $w_h/n = 0.05$. Setting $w_h$ by searching for the value that offers the best result based on a target evaluation criterion is proposed.



Fig. 4: DTW alignments with different region widths. 


2.5 Global-Affine RDTW

In global-affine RDTW (GARDTW), one time series is modeled as the scaled and offset version of another time series when aligning them with regional emphasis. Formally, the goal is to find a path $p^*$, scaling $c^*$ and offset $e^*$ that minimize

$$D_G(s,t,p,c,e,w_h) = \sum_{k=1}^{|p|} d_g(s_{a_k}, t_{b_k}, c, e, w_h)$$

subject to the DTW constraints, and

$$d_g(s_{a_k}, t_{b_k}, c, e, w_h) = \frac{1}{W_{a_k,b_k}} \sum_{\substack{w = -w_h \\ 1 \le a_k+w \le n \\ 1 \le b_k+w \le m}}^{w_h} d(s_{a_k+w},\ c\,t_{b_k+w} + e)$$

$D_G(s,t,p,c,e,w_h)$ subject to the DTW constraints will be referred to as $D_G(s,t,p,c,e,w_h)_{\text{constr.}}$. Similar to ADTW, finding $(p^*, c^*, e^*)$ is not computationally feasible, and instead a suboptimal solution $(p^g, c^g, e^g)$ using hard EM is sought in Algorithm 2. $p^g_u$ is obtained by applying RDTW on $s$ and $c^g_{u-1} t + e^g_{u-1}$, and $(c^g_u, e^g_u)$ is computed with equations in Appendix A. These equations can be derived by proving convexity of $D_G$ and setting its derivative with respect to $c$ and $e$ to zero. $D_G(s,t,p^g,c^g,e^g,w_h)$ will be referred to as the GARDTW difference measure. From Algorithm 2 and assuming $n = m$, GARDTW is $O(n_c w_b n)$ in time and $O(w_b n)$ in space, where $n_c$ is the number of iterations to convergence. Similar to ADTW, the scaling $c^g_u$ can be constrained to exist between $c_{\min}$ and $c_{\max}$ to avoid improbable scalings.


Algorithm 2 GARDTW

1: $p^g, c^g, e^g$, $c^g_0 \leftarrow 1$, $e^g_0 \leftarrow 0$, $D_{G,\text{prev}} \leftarrow \infty$, $u \leftarrow 1$

2: while 1 do

3: $p^g_u \leftarrow \arg\min_{p} D_G(s,t,p,c^g_{u-1},e^g_{u-1},w_h)_{\text{constr.}}$

4: $(c^g_u, e^g_u) \leftarrow \arg\min_{c,e} D_G(s,t,p^g_u,c,e,w_h)_{\text{constr.}}$

5: if $D_{G,\text{prev}} - D_G(s,t,p^g_u,c^g_u,e^g_u,w_h) < D_{\text{stop}}$ then

6: $p^g \leftarrow p^g_u$, $c^g \leftarrow c^g_u$, $e^g \leftarrow e^g_u$

7: break

8: $u \leftarrow u + 1$


GARDTW is illustrated in Figure 5, and it provides a better alignment than ADTW and RDTW by modeling both scaling and regional emphasis.

Fig. 5: ADTW, RDTW and GARDTW on time series scaled differently with component overlap. (A) ADTW on $s$ and $t$. (B) RDTW on $s$ and $t$. (C) GARDTW on $s$ and $t$.


2.6 Local-Affine RDTW


In local-affine RDTW (LARDTW), the region surrounding each point is assumed to be a scaled and offset version of another region surrounding the corresponding matched point. Formally, LARDTW finds an alignment $p^*$ that minimizes

$$D_L(s,t,p,w_h) = \sum_{k=1}^{|p|} d_l(s_{a_k}, t_{b_k}, w_h)$$

subject to the DTW constraints, where

$$d_l(s_a, t_b, w_h) = \frac{1}{W_{a,b}} \min_{c_{a,b},\, e_{a,b}} \sum_{\substack{w = -w_h \\ 1 \le a+w \le n \\ 1 \le b+w \le m}}^{w_h} d(s_{a+w}, \hat{t}_{b+w})$$

and $\hat{t}_{b+w} = c_{a,b}\, t_{b+w} + e_{a,b}$. $D_L(s,t,p^*,w_h)$ is referred to as the LARDTW difference measure.
The minimizing $(c^*_{a,b}, e^*_{a,b})$ can be obtained using the following equations for each pair of matched points $(s_a, t_b)$:

$$c^*_{a,b} = \frac{\rho_{a,b} - \frac{1}{W_{a,b}} \phi_{a,b} \tau_{a,b}}{\gamma_{a,b} - \frac{1}{W_{a,b}} \tau_{a,b}^2}, \qquad e^*_{a,b} = \frac{1}{W_{a,b}} \left(\phi_{a,b} - c^*_{a,b} \tau_{a,b}\right)$$

where

$$\rho_{a,b} = \sum_{w} s_{a+w} t_{b+w}, \quad \phi_{a,b} = \sum_{w} s_{a+w}, \quad \tau_{a,b} = \sum_{w} t_{b+w}, \quad \gamma_{a,b} = \sum_{w} t_{b+w}^2, \quad \eta_{a,b} = \sum_{w} s_{a+w}^2$$

and every sum runs over $w = -w_h, \ldots, w_h$ with $1 \le a+w \le n$ and $1 \le b+w \le m$.

Similar to ADTW, the scaling $c^*_{a,b}$ can be constrained to exist between $c_{\min}$ and $c_{\max}$ to avoid improbable scalings. Dynamic programming can again be utilized in the same manner as DTW, where the update formula for constructing the LARDTW table is as follows:

$$D_L(s,t,p^*_{(a,b)},w_h) = d_l(s_a,t_b,w_h) + \min\left(D_L(s,t,p^*_{(a-1,b-1)},w_h),\ D_L(s,t,p^*_{(a,b-1)},w_h),\ D_L(s,t,p^*_{(a-1,b)},w_h)\right)$$

The same backtracking technique used for DTW is applied to the LARDTW table to obtain $p^*$. LARDTW is illustrated in Figure 6, where its alignment is visually more appropriate than that of both ADTW and RDTW, and this result is attributed to LARDTW's ability to model different scalings for different regions.

Fig. 6: ADTW, RDTW and LARDTW on time series with different component scalings and widths. (A) ADTW on $s$ and $t$. (B) RDTW on $s$ and $t$. (C) LARDTW on $s$ and $t$.

Assuming $n = m$, the LARDTW table requires $O(w_b n)$ elements to be filled with the update formula. Most elements in the LARDTW table can be updated with a time complexity of $O(1)$ instead of $O(w_r)$ using the following observations:

$$\rho_{a,b} = \rho_{a-1,b-1} + s_{a+w_h} t_{b+w_h} - s_{a-w_h-1} t_{b-w_h-1}$$

and $(\gamma_{a,b}, \tau_{a,b}, \phi_{a,b}, \eta_{a,b})$ can be updated in a similar manner when $a, b > 1$. Thus, $(c^*_{a,b}, e^*_{a,b})$ can be updated in $O(1)$ time. Furthermore,

$$d_l(s_a,t_b,w_h) = \frac{1}{W_{a,b}} \left[ \eta_{a,b} - 2 c^*_{a,b} \rho_{a,b} - 2 e^*_{a,b} \phi_{a,b} + (c^*_{a,b})^2 \gamma_{a,b} + 2 c^*_{a,b} e^*_{a,b} \tau_{a,b} + W_{a,b} (e^*_{a,b})^2 \right]$$

so $d_l(s_a,t_b,w_h)$ can be updated in $O(1)$ time when $a, b > 1$. The time and space complexities for LARDTW are hence analyzed to be $O(w_b n)$.
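The local-affine regional distance for a single pair of points can be sketched in Python as follows. This is a hedged illustration: the function name, the variable names for the five regional sums, the clipping of the scaling to `c_bounds`, and the fallback to a scaling of 1 when the region of $t$ is constant (so the least-squares denominator vanishes) are assumptions made for the example. The sums are recomputed directly here rather than updated incrementally.

```python
import numpy as np

def local_affine_distance(s, t, a, b, w_h, c_bounds=(0.2, 5.0)):
    """Return (d_l, c, e) for 1-indexed a and b with squared difference d."""
    n, m = len(s), len(t)
    offsets = [w for w in range(-w_h, w_h + 1)
               if 1 <= a + w <= n and 1 <= b + w <= m]
    sr = np.array([s[a + w - 1] for w in offsets], dtype=float)  # region of s
    tr = np.array([t[b + w - 1] for w in offsets], dtype=float)  # region of t
    W = len(offsets)
    # Regional sums: cross term, sum of s, sum of t, sum of t^2, sum of s^2.
    rho, phi, tau = np.sum(sr * tr), np.sum(sr), np.sum(tr)
    gamma, eta = np.sum(tr ** 2), np.sum(sr ** 2)
    denom = gamma - tau ** 2 / W
    # Assumption: fall back to c = 1 when the region of t is constant.
    c = (rho - phi * tau / W) / denom if denom > 1e-12 else 1.0
    c = float(np.clip(c, *c_bounds))
    e = float((phi - c * tau) / W)
    # Expanded form of the mean squared residual of s against c * t + e.
    d = (eta - 2 * c * rho - 2 * e * phi
         + c ** 2 * gamma + 2 * c * e * tau + W * e ** 2) / W
    return float(d), c, e
```

When the region of $s$ is an exact scaled and offset copy of the region of $t$, for instance $s = 2t + 1$ with a scaling inside the bounds, the returned distance is zero up to rounding and the recovered scaling and offset are 2 and 1.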


3 Evaluation and Discussion 

3.1 Parameter Values of the Proposed Methods 

The proposed methods of ADTW, RDTW, GARDTW and LARDTW introduce additional parameters to DTW's bandwidth parameter $w_b$. What these parameters are set to or how they are set for evaluation is detailed in Table 1, and these parameters are described below. Recall that the Sakoe-Chiba bandwidth $w_b$ is $1 + 2w_q$, and $n$ is the typical length of a time series in the evaluated dataset. As a result, $w_q/n$ reflects the bandwidth $w_b$. Similarly, the region width $w_r$ is $1 + 2w_h$, so $w_h/n$ reflects the region width $w_r$ used by RDTW and methods that augment it. $D_{\text{stop}}$ is a parameter related to the stopping condition for the ADTW and GARDTW algorithms shown in Algorithms 1 and 2, and $(c_{\min}, c_{\max})$ exists to avoid improbable scalings for ADTW, GARDTW and LARDTW. Note that each of these parameters has been defined in the previous sections. Unless mentioned otherwise, for all completed evaluations, parameter values for DTW and our proposed methods were set as reported in Table 1.


3.2 Alignment Evaluation 

The alignments produced by the proposed methods were evaluated by comparing them with true alignments generated through simulations. Two different alignment simulation and evaluation approaches were explored. For global affine simulation, simulated temporal variations, scalings and offsets were imposed on real time series. For component-based simulation, varying widths and scalings were imposed on simulated components, and these components were superimposed with varying temporal offsets to create a set of time series for alignment evaluation.

3.2.1 Global Affine Simulation 

TABLE 1: Parameters and their assigned values of proposed methods used for evaluation.

| Parameter | Value | Related methods |
|---|---|---|
| $w_q/n$ | Tuned across {0, 0.05, ..., 0.5} | DTW, ADTW, RDTW, GARDTW, LARDTW |
| $w_h/n$ | Tuned across {0.05, 0.1, ..., 0.5} | RDTW, GARDTW, LARDTW |
| $D_{\text{stop}}$ | $10^{[\ldots]}$ | ADTW, GARDTW |
| $(c_{\min}, c_{\max})$ | (0.2, 5) | ADTW, GARDTW, LARDTW |

For global affine simulation, given a real time series $s$, a temporal variant of $s$, $\psi = (\psi_1, \ldots, \psi_z)$, was created by defining a warping function $\omega = (\omega(1), \ldots, \omega(z))$ where $\omega(i) \in \{1, \ldots, n\}$, and setting $\psi_i = s_{\omega(i)}$. The sequence $\omega$ was constrained to be monotonic to maintain a similar structure in $\psi$, and it was modeled as a random sequence in the following manner:

$$\omega(i+1) = \begin{cases} \omega(i) + 1 & \text{with probability } P_{\text{match}} \\ \omega(i) + 2 & \text{with probability } P_{\text{delete}} \\ \omega(i) & \text{with probability } P_{\text{insert}} \end{cases}$$

where $P_{\text{match}} + P_{\text{delete}} + P_{\text{insert}} = 1$, $\omega(1) = 1$, and this sequence ends when $\omega(z+1) > n$. The true alignment $p^{\mathrm{t}} = \{(a^{\mathrm{t}}_1, b^{\mathrm{t}}_1), \ldots, (a^{\mathrm{t}}_z, b^{\mathrm{t}}_z)\}$ between $s$ and $\psi$ can be constructed with $(a^{\mathrm{t}}_j, b^{\mathrm{t}}_j) = (\omega(j), j)$. Interpolation of $\psi$ was completed to have the same length as $s$, and $p^{\mathrm{t}}$ was modified accordingly for ease of evaluation. Additional scalings and offsets were imposed on $\psi$ to obtain $\mu = c\psi + e$, where $c$ and $e$ were uniformly distributed within $[c_{\min}, c_{\max}]$ and $[e_{\min}, e_{\max}]$ respectively. Gaussian white noise with a standard deviation of $\sigma_{\text{noise}}$ was also imposed. A slight variant of the measure introduced in [6] was used to evaluate an alignment $p$ on $s$ and $t \sim N(\mu, \sigma_{\text{noise}}^2 I_n)$, where $I_n$ is an identity matrix. The variant is as follows:

$$M_g(p^{\mathrm{t}}, p) = \frac{1}{n(n-1)} \sum_{i=1}^{n} \ \sum_{j:\, a^{\mathrm{t}}_j = i} \ \min_{b_k:\, a_k = i} \left| b^{\mathrm{t}}_j - b_k \right|$$

where better alignments have lower $M_g$ values.
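The random warping and affine distortion used in the global affine simulation can be sketched in Python as follows. This is an illustrative sketch only: the function names, the default probabilities, and the use of NumPy's `Generator` for sampling are assumptions, and the interpolation step that restores the original length is omitted.

```python
import numpy as np

def simulate_warp(s, p_match=0.6, p_delete=0.2, p_insert=0.2, rng=None):
    """Return (psi, true_alignment) for a random monotonic warping of s.

    true_alignment is a list of 1-indexed pairs (omega(j), j).
    """
    rng = np.random.default_rng() if rng is None else rng
    s = np.asarray(s, dtype=float)
    n = len(s)
    omega = [1]                               # omega(1) = 1
    steps = np.array([1, 2, 0])               # match, delete, insert
    probs = [p_match, p_delete, p_insert]
    while True:
        nxt = omega[-1] + int(rng.choice(steps, p=probs))
        if nxt > n:                           # sequence ends once omega exceeds n
            break
        omega.append(nxt)
    psi = s[np.array(omega) - 1]              # psi_i = s_{omega(i)}
    true_alignment = [(w, j) for j, w in enumerate(omega, start=1)]
    return psi, true_alignment

def affine_noisy_copy(psi, c, e, sigma_noise, rng=None):
    """Scale and offset psi, then add Gaussian white noise."""
    rng = np.random.default_rng() if rng is None else rng
    mu = c * np.asarray(psi, dtype=float) + e
    return mu + rng.normal(0.0, sigma_noise, size=len(mu))
```

A step of 1 keeps the two series in lockstep, a step of 2 skips a point of $s$ (a deletion), and a step of 0 repeats a point of $s$ (an insertion), so the generated sequence is monotonic by construction.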

Global affine simulations were based on 3 datasets taken from [7]. These datasets include the number of deaths across different ages in 1 year, the position of the lower lip when saying a certain word, and the temperature across 365 days. All aforementioned datasets are subject to temporal variations, scalings and offsets. For evaluation, 10 time series were taken from each dataset, and 10 time-distorted and affine versions were created for each time series. Let $\sigma$ be the standard deviation of all time series in a specific dataset. Then, $P_{\text{match}} = 0.6$, $P_{\text{delete}} = P_{\text{insert}} = 0.2$, $c_{\min} = 0.2$, $c_{\max} = 5$, $e_{\min} = -\sigma$, $e_{\max} = \sigma$ and $\sigma_{\text{noise}} = n_l \sigma$, where $n_l$ is defined to be the noise level. An alignment measure $M_g$ was obtained for each alignment method and it was averaged within each dataset to get a dataset score. To obtain an appropriate bandwidth and region width, $w_q/n$ and $w_h/n$ were tuned according to Table 1 based on the dataset score. The 3 dataset scores were then averaged to obtain $M_g^{\text{avg}}$.

$M_g^{\text{avg}}$ values with different methods and noise levels are plotted in Figure 7. When the noise level is low, ADTW outperforms other methods because it accounts for temporal variations, global scalings and global offsets simultaneously without regional emphasis. GARDTW is second behind ADTW when the noise level is low, but it starts to outperform ADTW as the noise level increases because regional emphasis can have an averaging effect in terms of the alignment.


Fig. 7: Alignment measure $M_g^{\text{avg}}$ of proposed methods across different noise levels. (Horizontal axis: noise level $n_l$. Vertical axis: alignment measure.)


3.2.2 Component-Based Simulation 

Each time series was composed of the superposition of different components with varying widths and scalings at different locations for component-based simulation. Let $s$ and $t$ be two simulated component-based time series of length $n$. Also, let $n_c$ be the number of components within a time series. Then, $s$ and $t$ were simulated as follows:

$$s = \sum_{j=1}^{n_c} \alpha^s_j\, f(l^s_j, w^s_j, z_j), \qquad t = \sum_{j=1}^{n_c} \alpha^t_j\, f(l^t_j, w^t_j, z_j)$$

where $f(l, w, z)$ is a component of type $z$ centered at location $l$ with width $w$, and $\alpha$ is the associated scaling factor. Different component types are associated with different windows commonly used in spectral analysis, where $z = 1, 2, 3, 4$ denote a Parzen, rectangular, triangular, and flat top weighted window respectively. $z_j$ and $(w^s_j, w^t_j)$ were all generated from discrete uniform distributions with parameters $(z_{\min}, z_{\max}) = (1, 4)$ and $(w_{\min}, w_{\max}) = (\ldots, \ldots)$. $(l^s_j, l^t_j)$ were also generated from a discrete uniform distribution with parameters $(l_{\min}, l_{\max}) = (\ldots, \ldots)$, albeit with an additional constraint that the chronological order of components in $s$ is the same in $t$. In other words, it was assumed that correct alignments can only proceed forward in time as dictated by DTW's monotonicity constraint. In addition, $(\alpha^s_j, \alpha^t_j)$ were generated from a folded normal distribution with parameters $(\mu_\alpha = 1, \sigma_\alpha)$.


To produce a true alignment, only points associated with components were considered. The true alignment $p^{\mathrm{t}} = \{(a^{\mathrm{t}}_1, b^{\mathrm{t}}_1), \ldots, (a^{\mathrm{t}}_{|p^{\mathrm{t}}|}, b^{\mathrm{t}}_{|p^{\mathrm{t}}|})\}$ between $s$ and $t$ can be broken down into two parts: non-overlapping and overlapping. For sections that do not have any overlap of components, obtaining the true alignment is straightforward because it is exactly known how to match one component in $s$ to the corresponding component in $t$ during synthesis of the component-based time series. However, ambiguity arises when component overlap exists, and it was decided to map an overlapped point to the component whose center is closest to the overlapped point, because all simulated components were symmetric and clearly identifiable close to the center. The component-based evaluation measure $M_c$ is a slight variation of the measure $M_g$ used for global affine simulation:

$$M_c(p^{\mathrm{t}}, p) = \frac{1}{[\ldots]\,(n-1)} \sum_{\substack{\forall i \text{ belonging} \\ \text{to a component}}} \ \sum_{j:\, a^{\mathrm{t}}_j = i} \ \min_{b_k:\, a_k = i} \left| b^{\mathrm{t}}_j - b_k \right|$$


The evaluated $M_c$ values were averaged across 100 simulated pairs of $s$ and $t$ where $n = 400$ and $n_c = 4$, and the resulting $M_c^{\text{avg}}$ scores are displayed for each alignment method across different $\sigma_\alpha$ values in Figure 8. For this experiment, $w_q/n = 0.5$ was used and $w_h/n$ was tuned according to Table 1 based on $M_c^{\text{avg}}$. It can be observed that methods based on RDTW consistently outperformed DTW and ADTW, because RDTW has a regional emphasis. Furthermore, as the same component is subject to higher variations in amplitude, LARDTW outperforms other methods by larger amounts because each region potentially reflective of a component can be scaled differently in LARDTW.


Fig. 8: Component-based alignment measure $M_c^{\text{avg}}$ of proposed methods across different variances in scaling. (Legend: DTW, ADTW, RDTW, GARDTW, LARDTW. Horizontal axis: scaling standard deviation. Vertical axis: alignment measure.)


3.3 Difference Measure Evaluation 

The difference measures associated with the proposed alignment methods were evaluated on 44 datasets from the UCR time series database using the one-nearest-neighbor (1-NN) error rate. To obtain an appropriate bandwidth and region width for each method (DTW, ADTW, RDTW, GARDTW and LARDTW) and dataset for testing, $w_q/n$ and $w_h/n$ were tuned according to Table 1 based on the 2-fold stratified cross-validation error rate on the training set.
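The 1-NN error rate used for this evaluation can be sketched in Python as follows. The sketch is generic and makes several assumptions: the function names are hypothetical, the difference measure is passed in as any callable taking two series, and a squared Euclidean measure is used only as a simple stand-in where a DTW-based difference measure would be plugged in.

```python
import numpy as np

def one_nn_error_rate(train_x, train_y, test_x, test_y, measure):
    """Fraction of test series whose nearest training series has a different label."""
    errors = 0
    for x, y in zip(test_x, test_y):
        # Difference between the test series and every training series.
        diffs = [measure(x, x_train) for x_train in train_x]
        predicted = train_y[int(np.argmin(diffs))]
        errors += int(predicted != y)
    return errors / len(test_y)

def squared_euclidean(x, y):
    """Stand-in difference measure for equal-length series (placeholder only)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum((x - y) ** 2))
```

Tuning a parameter such as the bandwidth then amounts to evaluating this error rate on held-out folds of the training set for each candidate value and keeping the value with the lowest error.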

DTW is compared against the proposed difference measures using the 1-NN error rate in Figure 9, and clear improvements can be observed for the proposed measures on specific datasets. It is unsurprising that DTW outperformed the proposed measures on certain datasets, because there are datasets (e.g. SyntheticControl) where scaling/offset differences and global trends (exact opposites of affine and regional properties) are important for discrimination.


Fig. 9: 1-NN error rates of proposed difference measures against DTW on the UCR database. Each point denotes a dataset. Panels: ADTW error rate vs DTW error rate; RDTW error rate vs DTW error rate; GARDTW error rate vs DTW error rate; LARDTW error rate vs DTW error rate.

The win-loss ratios of 1-NN with proposed difference measures against other state-of-the-art elastic difference measures are presented in Table 2, where a tie contributes 0.5 to both the number of wins and the number of losses. We call these compared difference measures elastic because distinct alignments with different properties are also generated in the process of computing these difference measures. It is not unreasonable to expect that the classification performance of the respective difference measures can reflect the quality of their respective alignments. The compared state-of-the-art elastic difference measures include weighted DTW (WDTW) [8], derivative DTW (DDTW) [9], weighted derivative DTW (WDDTW) [10], longest common subsequence (LCSS) [11], move-split-merge (MSM) [12], time warp edit (TWE) [13] and edit distance with real penalty (ERP) [14]. The evaluated results of these compared elastic difference measures on the UCR database were taken from [15], and there are 43 datasets that overlap with the evaluation of our proposed methods. Table 2 shows that the RDTW and GARDTW difference measures outperformed DTW with win-loss ratios greater than 2. While ADTW and LARDTW do not seem to offer particular advantages over DTW from a win-loss ratio perspective, we will later demonstrate that they offer specialized advantages among the compared difference measures. Furthermore, 1-NN with the proposed difference measures is also competitive with the state-of-the-art elastic difference measures as demonstrated by the associated win-loss ratios.

Among the 43 overlapping datasets evaluated by all compared elastic difference measures, DTW, ADTW, RDTW, GARDTW, LARDTW, WDTW, DDTW, WDDTW, LCSS, MSM, TWE and ERP are each best for 4, 3, 2, 3, 8, 4, 3, 6, 2, 5, 2 and 1 datasets respectively. This suggests that specialized advantages exist for each proposed difference measure on specific datasets even among current state-of-the-art elastic difference measures. To reiterate, among all evaluated measures and datasets in Table 2, ADTW was found to be best for 3 datasets, RDTW for 2 datasets, GARDTW for 3 datasets, and LARDTW for 8 datasets among the 43 datasets. An ensemble classifier based on the compared elastic difference measures (WDTW, DDTW, WDDTW, LCSS, MSM, TWE and ERP) has been demonstrated to be the most accurate time series classifier ever proposed in the data mining literature [15]. Considering that our proposed difference measures are best for 16 out of the 43 evaluated datasets among the compared elastic difference measures, it is not unreasonable to expect even better results from incorporating our proposed difference measures into the aforementioned ensemble classifier.


TABLE 2: Win-loss ratios of 1-NN with proposed difference measures against
other state-of-the-art elastic difference measures on the UCR database.*

State-of-the-art     Proposed elastic difference measures
measure              ADTW             RDTW             GARDTW           LARDTW
DTW                  1.1 (15,16,12)   2.1 (25,8,10)    2.2 (26,7,10)    1.1 (22,2,19)
WDTW                 0.7 (16,3,24)    1.7 (25,4,14)    2.0 (27,3,13)    1.0 (20,2,21)
DDTW                 1.9 (28,0,15)    2.9 (32,0,11)    2.6 (31,0,12)    3.3 (33,0,10)
WDDTW                1.3 (24,1,18)    2.0 (28,1,14)    2.4 (30,1,12)    2.7 (31,1,11)
LCSS                 1.5 (25,1,17)    3.8 (34,0,9)     3.1 (32,1,10)    1.3 (23,2,18)
MSM                  1.0 (22,0,21)    0.8 (19,0,24)    1.1 (23,0,20)    0.9 (20,0,23)
TWE                  1.3 (23,2,18)    1.7 (27,0,16)    2.0 (28,1,14)    1.0 (21,1,21)
ERP                  1.4 (24,2,17)    3.5 (33,1,9)     2.6 (30,2,11)    1.5 (25,1,17)

*The numbers of wins, ties and losses against the compared methods are shown in this order within the brackets.


The average performance rankings of 1-NN with the proposed
difference measures and the compared elastic difference
measures are illustrated in Figure 10, where a lower rank
corresponds to a more accurate classifier. A total of 12
elastic difference measures were compared. For each dataset,
each difference measure was given a rank from 1 to 12 based on
its associated 1-NN error rate. For each method, the ranks
across the 43 UCR datasets were then averaged to produce its
average performance ranking. RDTW and GARDTW clearly have lower
average ranks than all other methods. However, it should also
be noted that there is no statistically significant difference
between any of our proposed methods and most of the compared
elastic difference measures based on the Friedman rank test.
In Figure 10, two elastic difference measures are significantly
different in rank based on the associated 1-NN error rate if
the absolute difference of their average ranks exceeds the
critical difference of 2.356.
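The ranking procedure described above can be sketched as follows. The
handling of tied error rates is not specified in the text; the sketch
assumes the usual convention of giving tied measures the mean of the
ranks they span. The function names and the layout of `error_table`
are hypothetical, and the critical difference of 2.356 is taken from
the text rather than derived.

```python
def average_ranks(values):
    """Rank values in ascending order starting from 1; tied values
    share the mean of the positions they occupy (assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mean_ranks(error_table):
    """error_table[d][m] is the 1-NN error rate of measure m on
    dataset d; returns the average rank of each measure."""
    totals = [0.0] * len(error_table[0])
    for row in error_table:
        for m, rank in enumerate(average_ranks(row)):
            totals[m] += rank
    return [total / len(error_table) for total in totals]


def significantly_different(rank_a, rank_b, critical_difference=2.356):
    """True if two average ranks differ by more than the critical
    difference quoted in the text."""
    return abs(rank_a - rank_b) > critical_difference
```

With 12 measures, two methods must therefore be separated by more
than roughly two and a third rank positions on average before their
difference is considered significant.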


Fig. 10: Average ranks of 1-NN for proposed difference
measures and compared elastic difference measures.


3.4 ADTW vs Normalization Before DTW

The difference between ADTW and normalization before DTW might
not be clear thus far. Normalization before DTW finds the
scaling and offset regardless of the alignment, whereas ADTW
uses the alignment to find an appropriate scaling and offset in
an iterative manner and updates the alignment accordingly. In
Figure 11A, ADTW and normalization before DTW are compared
across different warping probabilities (equal to
$2P_{\mathrm{delete}} = 2P_{\mathrm{insert}}$) based on the
global affine simulation (described in Section 3.1.1), where
ADTW outperforms normalization before DTW by larger amounts as
the warping level increases. The difference between their
output alignments is also illustrated in Figure 11B and C, and
ADTW's alignment is closer to the true one. We are by no means
claiming that ADTW is better than normalization before DTW in
general, especially since the win-loss ratios for real datasets
in Table 2 provide no evidence that ADTW-based classification
is better than DTW-based classification. However, what we do
want to point out is that ADTW can offer distinctly different
alignments from normalization before DTW, because ADTW has the
unique property that the alignment, scaling and offset all
depend on each other. This property might be useful when large
amounts of warping occur, as suggested in Figure 11. It is also
important to note that ADTW motivated the development of
GARDTW and LARDTW.


[Figure 11, panels B (ADTW) and C (normalize before DTW)]

Fig. 11: ADTW vs normalization before DTW for global
affine simulation and illustration of their differences.


3.5 Analysis of Region Width

An earlier figure demonstrates that region width can be a
crucial parameter for RDTW. Figure 12 shows the sensitivity of
the 1-NN RDTW classification accuracy to the region width for
the UCR datasets studied: the accuracy is very sensitive to the
region width for some datasets and insensitive to it for
others. Furthermore, the tuned region width is not arbitrary.
FastShapelet provided the lowest error rate ever recorded for
the ECGFiveDays dataset by extracting a subsequence (delayed
T-wave) confirmed to be discriminative by a medical expert.
Interestingly, the tuned region width for RDTW is roughly equal
to the length of this subsequence, and LARDTW offers an even
better error rate of 0 by appropriately handling the wandering
baseline. Time series extracted from leaf images have
previously been presented as a motivating example for a
complexity-invariant measure. The associated complexities
manifest locally, and the tuned region width for such time
series (e.g. the OSULeaf dataset) is small, thereby emphasizing
these local differences. In this work, 1-NN using LARDTW with a
small region width outperforms the complexity-invariant measure
for such data.
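The sensitivity discussed above can be probed with a simple grid
search, sketched here. Leave-one-out 1-NN on the training set is an
assumed tuning protocol, and `local_mean_distance` is a hypothetical
stand-in for a width-dependent difference measure, not RDTW itself.

```python
def local_mean_distance(a, b, width):
    """Hypothetical stand-in: compares means over sliding windows
    of `width` points (this is not the RDTW measure)."""
    total = 0.0
    for i in range(len(a) - width + 1):
        mean_a = sum(a[i:i + width]) / width
        mean_b = sum(b[i:i + width]) / width
        total += abs(mean_a - mean_b)
    return total


def loo_error(series, labels, distance, width):
    """Leave-one-out 1-NN error rate for one candidate width."""
    mistakes = 0
    for i, query in enumerate(series):
        nearest = min((j for j in range(len(series)) if j != i),
                      key=lambda j: distance(query, series[j], width))
        mistakes += labels[nearest] != labels[i]
    return mistakes / len(series)


def tune_width(series, labels, distance, widths):
    """Return the width with the lowest error and all error rates."""
    errors = {w: loo_error(series, labels, distance, w) for w in widths}
    return min(widths, key=errors.get), errors


train = [[0, 1, 0, 1], [0, 1, 0, 1], [1, 0, 1, 0], [1, 0, 1, 0]]
train_labels = [0, 0, 1, 1]
best_width, errors = tune_width(train, train_labels,
                                local_mean_distance, [1, 2])
```

The spread between the largest and smallest value in `errors` is the
kind of per-dataset range that Figure 12 summarizes.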


Fig. 12: Range of 1-NN RDTW error rates across different
region widths on each UCR dataset.


3.6 Actual Runtime Comparison

The runtime complexity of DTW, ADTW, RDTW, GARDTW and LARDTW
was discussed in the Alignment Methodology section. DTW, RDTW
and LARDTW share a time complexity of $O(w_b n)$, whereas ADTW
and GARDTW share a time complexity of $O(n_c w_b n)$. Recall
that $w_b$ is the Sakoe-Chiba bandwidth, $n_c$ is the number of
iterations required for convergence of the ADTW and GARDTW
algorithms, and $n$ is the length of the time series being
aligned.
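The $O(w_b n)$ term can be made concrete with a minimal sketch of
plain DTW restricted to a Sakoe-Chiba band. The squared-difference
local cost and the function name are assumptions; the sketch returns
only the accumulated cost and is not the implementation used in this
work.

```python
import math


def banded_dtw(s, t, band):
    """DTW cost with warping restricted to |i - j| <= band."""
    n, m = len(s), len(t)
    band = max(band, abs(n - m))  # the band must reach the final cell
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        # Only about 2 * band + 1 cells are evaluated in each row.
        for j in range(max(1, i - band), min(m, i + band) + 1):
            cost = (s[i - 1] - t[j - 1]) ** 2  # assumed local cost
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return prev[m]


print(banded_dtw([0, 0, 1], [0, 1, 1], band=0))  # diagonal only
print(banded_dtw([0, 0, 1], [0, 1, 1], band=1))  # one step of warping
```

The number of evaluated cells grows with the product of the bandwidth
and the series length; an iterative method repeats this work once per
iteration, which is where the extra factor $n_c$ comes from.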

In Figure 13, the average time taken to produce an alignment
for each proposed method (ADTW, RDTW, GARDTW and LARDTW) is
compared against DTW. Each point in Figure 13 denotes a dataset
from the UCR database. Note that the time series within each
UCR dataset have a fixed length. The average pairwise alignment
times were calculated by randomly selecting 20 time series from
each UCR dataset for 10 computations. The parameter values used
in this experiment largely follow the previously tabulated
values, with two parameters being the exceptions. Both are set
to 0.2, because most of their values tuned on the training sets
of the UCR database based on the 1-NN error rate do not exceed
0.2.
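A timing harness along these lines is sketched below. The text does
not say how the 20 selected series map onto the 10 computations; the
sketch assumes they form 10 disjoint pairs. The function name, the
`align` callable and the fixed seed are hypothetical.

```python
import random
import time


def average_alignment_time(dataset, align, n_series=20, seed=0):
    """Average wall-clock time of align(s, t) over disjoint pairs
    drawn at random from one dataset (assumed protocol)."""
    rng = random.Random(seed)
    chosen = rng.sample(dataset, n_series)
    pairs = list(zip(chosen[0::2], chosen[1::2]))  # 20 series -> 10 pairs
    start = time.perf_counter()
    for s, t in pairs:
        align(s, t)
    elapsed = time.perf_counter() - start
    return elapsed / len(pairs), len(pairs)
```

Here `dataset` is a list of equal-length series and `align` is any
pairwise alignment routine; plotting the returned average against the
series length for each dataset gives one point per dataset.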

In Figure 13, we can observe that RDTW has an almost identical
computation time to DTW, whereas ADTW, GARDTW and LARDTW are
visibly slower than DTW. The alignment times for LARDTW and DTW
do not differ by more than a constant multiple of 0.5. Both
ADTW and GARDTW can be dramatically slower than DTW, with more
than a 5-fold difference for larger datasets, because more
iterations are required for convergence (reflected in $n_c$).
These results suggest that ADTW and GARDTW have high relative
computation costs when applied to larger datasets, whereas RDTW
and LARDTW have computation times comparable to DTW for larger
datasets. Nonetheless, ADTW and GARDTW could offer specific
advantages for smaller datasets.
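The role of the iteration count $n_c$ can be illustrated with a
generic alternating scheme: align, refit a scaling and offset by
least squares on the matched points, and repeat until the alignment
stops changing. This is a sketch in the spirit of the iterative
description given for ADTW, not the authors' algorithm; the
squared-difference cost, the unconstrained DTW, the stopping rule
and all function names are assumptions.

```python
def dtw_path(s, t):
    """Unconstrained DTW; returns the matched index pairs."""
    n, m = len(s), len(t)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (s[i - 1] - t[j - 1]) ** 2
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1],
                                   acc[i - 1][j - 1])
    path, i, j = [], n, m
    while i > 0 and j > 0:  # backtrack from the final cell
        path.append((i - 1, j - 1))
        _, i, j = min((acc[i - 1][j - 1], i - 1, j - 1),
                      (acc[i - 1][j], i - 1, j),
                      (acc[i][j - 1], i, j - 1))
    return path[::-1]


def fit_affine(s, t, path):
    """Least-squares scale a and offset b so that a * s[i] + b
    approximates t[j] over the matched pairs."""
    xs = [s[i] for i, _ in path]
    ys = [t[j] for _, j in path]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0:
        return 1.0, my - mx  # degenerate case: fit the offset only
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx


def iterative_affine_dtw(s, t, max_iter=20):
    """Alternate alignment and affine fitting; also returns the
    number of DTW computations performed."""
    a, b, path = 1.0, 0.0, None
    for iteration in range(1, max_iter + 1):
        new_path = dtw_path([a * x + b for x in s], t)
        if new_path == path:
            break
        path = new_path
        a, b = fit_affine(s, t, path)
    return a, b, path, iteration
```

Each pass costs one full alignment, so the total runtime scales with
the number of passes needed before the alignment stabilizes.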



[Figure 13; horizontal axes: time series length]


Fig. 13: Actual time taken to compute the proposed difference
measures against DTW on the UCR database. The average pairwise
alignment times were calculated by randomly selecting 20 time
series from each UCR dataset for 10 computations.


Appendix A 

GARDTW Affine Equations 


Setting p = p^, then 
4 Conclusion

ADTW, RDTW, GARDTW and LARDTW are alignment methods whose
models include affine invariance and regional emphasis. If they
are applied to problems whose underlying models are similar to
the models behind the methods, the proposed DTW variants are
expected to provide performance gains, as demonstrated in this
work on simulated models and real datasets.